a bite-size comparison of two groups
4/26/23
Build a reproducible script for differential gene expression (DGE) analysis of Montipora capitata rice coral samples using a reference genome
M. capitata development diagram by A. Huffmyer
What I need to do…
Transcript Quantification (Alignment) with HISAT2
Differential Expression Analysis HISAT2& DESeq2
And the result was this count matrix that I don’t trust:
SRR22293447 SRR22293448 SRR22293449
Montipora_capitata_HIv3___RNAseq.g16177.t2a 0 0 0
Montipora_capitata_HIv3___TS.g32517.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 66 66 72
Montipora_capitata_HIv3___TS.g4674.t2b 13 8 8
SRR22293450 SRR22293451 SRR22293452
Montipora_capitata_HIv3___RNAseq.g16177.t2a 0 3 1
Montipora_capitata_HIv3___TS.g32517.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 78 82 67
Montipora_capitata_HIv3___TS.g4674.t2b 3 2 6
SRR22293453 SRR22293454
Montipora_capitata_HIv3___RNAseq.g16177.t2a 6 2
Montipora_capitata_HIv3___TS.g32517.t1 0 0
Montipora_capitata_HIv3___RNAseq.g13366.t1 0 0
Montipora_capitata_HIv3___RNAseq.g226.t1 0 0
Montipora_capitata_HIv3___RNAseq.g39492.t1 103 65
Montipora_capitata_HIv3___TS.g4674.t2b 7 1
The DESeqDataSet object, named deseq.dds, is then passed to the DESeq() function to fit a negative binomial model and estimate dispersions.
Formal class 'DESeqResults' [package "DESeq2"] with 7 slots
..@ priorInfo : list()
..@ rownames : chr [1:54384] "Montipora_capitata_HIv3___RNAseq.10136_t" "Montipora_capitata_HIv3___RNAseq.10187_t" "Montipora_capitata_HIv3___RNAseq.10207_t" "Montipora_capitata_HIv3___RNAseq.10214_t" ...
..@ nrows : int 54384
..@ elementType : chr "ANY"
..@ elementMetadata:Formal class 'DFrame' [package "S4Vectors"] with 6 slots
.. .. ..@ rownames : NULL
.. .. ..@ nrows : int 6
.. .. ..@ elementType : chr "ANY"
.. .. ..@ elementMetadata: NULL
.. .. ..@ metadata : list()
.. .. ..@ listData :List of 2
.. .. .. ..$ type : chr [1:6] "intermediate" "results" "results" "results" ...
.. .. .. ..$ description: chr [1:6] "mean of normalized counts for all samples" "log2 fold change (MLE): condition recruits vs embryos" "standard error: condition recruits vs embryos" "Wald statistic: condition recruits vs embryos" ...
..@ metadata :List of 6
.. ..$ filterThreshold: Named num 1.04
.. .. ..- attr(*, "names")= chr "63.21844%"
.. ..$ filterTheta : num 0.632
.. ..$ filterNumRej :'data.frame': 50 obs. of 2 variables:
.. .. ..$ theta : num [1:50] 0.327 0.34 0.353 0.365 0.378 ...
.. .. ..$ numRej: num [1:50] 5868 5868 5917 5950 6005 ...
.. ..$ lo.fit :List of 2
.. .. ..$ x: num [1:50] 0.327 0.34 0.353 0.365 0.378 ...
.. .. ..$ y: num [1:50] 5854 5887 5920 5953 5986 ...
.. ..$ alpha : num 0.1
.. ..$ lfcThreshold : num 0
..@ listData :List of 6
.. ..$ baseMean : num [1:54384] 13.45 2.99 2768.01 46.87 7.97 ...
.. ..$ log2FoldChange: num [1:54384] 4.442 0.107 -0.943 0.43 2.357 ...
.. ..$ lfcSE : num [1:54384] 0.826 1.175 0.104 0.243 0.64 ...
.. ..$ stat : num [1:54384] 5.3774 0.0912 -9.0699 1.7665 3.6819 ...
.. ..$ pvalue : num [1:54384] 7.56e-08 9.27e-01 1.19e-19 7.73e-02 2.31e-04 ...
.. ..$ padj : num [1:54384] 9.34e-07 9.61e-01 4.78e-18 1.83e-01 1.48e-03 ...
[1] 5736 6
Top 50 Differentially Expressed Genes
Week06
HISAT2 & align readsWeek07
HISAT2& DESeq2Week08
Debug all the inevitable mistakes
Clean up scripts, add helpful tips
Week09-10 If there’s time…
Go back and build scripts to work with raw sequence data:
Quality control with FastQC or MultiQC
Trim & Clean with fastp or trimgalore
Quality control trimmed sequences
Create HISAT2 index for the reference genome hisat2-build path/to/reference_genome.fasta path/to/hisat2_index/genome